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ABSTRACT 

The 21-cm PDF (i.e., distribution of pixel brightness temperatures) is expected to be 
highly non-Gaussian during reionization and to provide important information on the 
distribution of density and ionization. We measure the 21-cm PDF as a function of 
redshift in a large simulation of cosmic reionization and propose a simple empirical 
fit. Guided by the simulated PDF, we then carry out a maximum likelihood analysis 
of the ability of upcoming experiments to measure the shape of the 21-cm PDF and 
derive from it the cosmic reionization history. Under the strongest assumptions, we 
find that upcoming experiments can measure the reionization history in the mid to 
late stages of reionization to 1 — 10% accuracy. Under a more flexible approach that 
allows for four free parameters at each redshift, a similar accuracy requires the lower 
noise levels of second-generation 21-cm experiments. 
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1 INTRODUCTION 

The earliest generations of stars are thought to have trans- 
formed the universe from darkness to light and to have reion- 
ized and heated the intergalactic medium (jBarkana fc Loebl 
l200lh . Knowing how the reionization process happened is 
a primary goal of cosmologists, because this would tell us 
when the early stars formed and in what kinds of galaxies. 
The clustering of these galaxies is particularly interesting 
since it is dr iven by large-scale den sity fluctuations in the 
dark matter (|Barkana fe Loeal2004l ) . While the distribution 
of neutral hydrogen during reionization can in principle be 
measured from maps of 21-cm emission by neutral hydro- 
gen, upcoming experiments are expected to be able to de- 
tect ionization fluctuations only statistically (for re views see, 
e.g.. lFurlanetto et allkoOfj ; iBarkana fc Loebll2007l '). Current 
observational efforts include the Murchison Widefield Ar- 
ray (MWA, www.haystack.mit.edu/ast/arrays/mwa/), the 
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Low Frequency Array (www.lofar.org), the Giant Me- 
trewave Radio Telescope (gmrt.ncra.tifr.res.in), and the 
Precision Array to Probe the Epoch of Reionization 
(astro.berkeley.edu/~dbacker/eor/). 

Studies of statistics of the 21-cm fluctuations have 
focused on the two-point correlation function (or power 
spectrum) of the 21-cm brightness temperature. This is 
true both for analytical and numerical studies and anal- 
yses of the expected sensitivity o f the new experiment s 
l|Bowman. Morales, fc Hewittl l200rj ; iMcQuinn et~ail 120061 ). 
The power spectrum is the natural statistic at very high red- 
shifts, as it contains all the available statistical information 
as long as Gaussian primordial density fluctuations drive the 
21-cm fluctuations. More generally, the power spectrum is 
also closely related to the directly observed radio visibili- 
ties. Now, during reionization the hydrogen distribution is a 
highly non-linear function of the distribution of the underly- 
ing ionizing sources. This follows most simply from the fact 
that the H I fraction is constrained to vary between and 
1, and this range is fully covered in any scenario driven by 



2 Ichikawa, Barkana, Iliev, Mellema, Shapiro 



stars, in which the intergalactic medium is sharply divided 
between H I and H II reg ions. The resulting non-Gaussianity 
ijBharadwai fc Alfo oS) raises the possibility of using com- 
plementary statistics to measuring additional information 
that is not directly derivable from the power s pectrum (e.g., 
iFurlanetto et al.ll2004l : [Saivad-Ali et al.ll2006l ). 

Numerical simulations have recently begun to reach the 
large scales (of order 100 Mpc) needed to capture the evolu- 
tion of the interga l actic medium (IGM) during reionization 
( Ilicv et al. I l2006bl ; iMellema et al 1 l2006bl : IZahn et all I2OO7I : 
ISantos et al.ll20o"8h . These simulations account accurately for 
gravitational evolution and the radiative transfer of ionizing 
photons, but still crudely for gas dynamic s and star forma- 
tion. Analytically, IFurlanetto et aL l|2004h used the statis- 
tics of a random walk with a linear barrier to model the 
H II bubble size distribution during the reionization epoch. 
Schematic approxim ations were developed for the two-point 
corre lation function iFurlanetto et aT]|2004l ; iMcQuinn et alj 
2005), but recently iBarkanal developed an accurate, 

self-consistent analytic al expression for the fu ll two-point 
distribution within the IFurlanetto et aL I l|2004 ) model, and 
in particular used it to calculate the 21-cm correlation func- 
tion. 

Noting the expected non -Gaussianity and the i mpor- 
tance of additional statistics, IFurlanetto et al] l|2004h also 
calculated the one-point probability distribution function 
(PDF) of the 21-cm brightness temperature at a point. The 
PDF has begun to be explored in numerical simu lations as 
well (jCiardi fc Madaull2003l : IMellema et a!] |2006b). Some of 
the additional inform ation available in the PDF can be cap- 
tured by its skewness (|Wvithe fc Morales l l2007l ; lHarker et all 



l2009MBarkana fc LoebT ( 2008) have also considered the dif- 
ference PDF, a two-dimensional function that generalizes 
both the one-point PDF and the correlation function and 
yields additional information beyond those statistics. 



Recently, lOh et ail (|2009l ) have quantitatively consid- 
ered the ability of upcoming experiments to determine the 
cosmic reionization history from maximum likelihood fitting 
of the 21-cm PDF. They specifically used mixture modeling 
of the PDF. In this paper we develop a method for statistical 
analysis of the PDF that is simpler and more efficient (allow- 
ing, in particular, binning of the PDF). We use our method 
to present a quantitative analysis of whether upcoming and 
future experiments can measure the detailed shape of the 
21-cm PDF and derive from it the cosmic reionization his- 
tory. In section 2 we develop our basic statistical method for 
fitting the 21-cm PDF, and test it on a simple toy model for 
the PDF. We then measure and follow the evolution of the 
PDF in a large N-body and radiative transfer simulation of 
cosmic reionization; since previous analytical models of the 
PDF differ qualitatively from the PDF in the simulation, 
here we simply fit the simulated PDF with an empirical, 
four-parameter model (section 3). Finally, we present the 
expected accuracy of reconstructing the 21-cm PDF and the 
cosmic reionization history based on the simulated PDF, ei- 
ther with strict assumptions that lead to one free parameter 
at each redshift (section 4), or with a more flexible approach 
that allows for four free parameters (section 5). We summa- 
rize our conclusions in section 6. 



2 BASIC METHOD 

In this section we develop our basic statistical method for 
fitting the PDF. While the statistical approach is general, for 
concreteness we develop it within the context of a simple toy 
model for the PDF. We also use this toy, double-Gaussian 
model in order to get a crude quantitative intuition on how 
hard it is to measu re the 21- c m PD F. We note that we fol- 
low to some degree lOh et al.l (|2009l ). who considered such a 
double-Gaussian toy model and made a signal-to-noise study 
of this model with their analysis method. 



2.1 A Toy Model for the PDF 

It is useful to have a simple PDF example on which to de- 
velop and test our methods. We present here a simplified 
toy model that captures the main qualitative features of 
the PDF as seen in the simulations (and shown later in 
the paper) during the central stage of reionization, when 
the cosmic ionization fraction Xi ~ 0.3 — 0.6. The PDF 
at this stage has a sharp peak at a differential brightness 
temperature (defined as the difference between the actual 
brightness temperature and the temperature of the cosmic 
microwave background at the same frequency) of Tb = mK 
corresponding to fully ionized pixels, and another peak at 
Tb ~ 20 mK corresponding to mostly neutral pixels, with 
a rapidly declining probability at values above 20 mK, and 
a smooth probability density in between the peaks that is 
lower than the height of either peak. In the observations, 
this physical PDF is convolved with a broad Gaussian cor- 
responding to the thermal noise, resulting in both positive 
and negative values of Tb- In the limit when we approxi- 
mate both peaks as delta functions and neglect the physical 
PDF at other points, the observed PDF becomes a sum of 
two Gaussians with equal standard deviations a. While cer- 
tainly highly simplified, this model does capture the main 
question (relevant especially for low signal-to-noise data, i.e., 
when a 2> 20 mK) of whether it is at all possible to tell apart 
the two peaks and not confuse them with a convolved single 
peak (i.e., a single Gaussian). 

Thus, we consider two Gaussian distributions with 
equal standard deviation a (where a represents the mea- 
surement noise level). In the toy model we use a dimension- 
less s as the dependent variable (which represents Tb in the 
real PDF). The Gaussian representing the reionized pixels 
is centered at s = 0, while the neutral pixels are represented 
by a Gaussian centered at s — sq- The fraction of the total 
probability contained in the first Gaussian is a. The total 
distribution is therefore 



p(s) = aG(s, a) + (1 — a)G(s — sg, a) 



where 
G(x,a) 



exp(— x 2 /2a 2 ) 



27TCT 



(1) 



(2) 



Since in the real case only differences in Tb can be mea- 
sured, and not the absolute Tb (which is dominated by fore- 
grounds), in the toy model we assume that the absolute s 
cannot be measured. A simple practical way to do this is to 
always measure s with respect to its average value according 
to the PDF of s; we do this separately in each model and in 
each simulated data set, and thus only compare the relative 
distributions between each model and each data set. 
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2.2 Maximum Likelihood and the C-Statistic 



2.3 Results for the Toy Model 



In this subsection we develop our basic statistical method 
for fitting the PDF, referring to the above toy model as an 
example for the PDF. In general, we can create mock data 
sets by randomly generating N p values of s from a given 
p(s) distribution, and we can then try to estimate the best- 
fitting parameters with a maximum likelihood method. For 
a given mock observed PDF, as given by the N p generated 
values of s, we wish to find the best-fitting model PDF p(s) 
by maximizing the likelihood C that the N p values Si (i = 1, 
2, . . . , N p ) came from p(s). Since the different values are 
independent, this probability (apart from fixed As factors) 
is simply 



For the toy, double-Gaussian model, the parameters we wish 
to fit to mock data sets are sg and a. Note that we assume 
that a is known, as we expect that the level of thermal noise 
per pixel will be known in the 21-cm experiments, given 
the known array properties and the measured foreground 
level. We perform 1000 Monte Carlo for each input model, 
and thus obtain the full distribution of reconstructed model 
parameters. In order to develop intuition on how hard it is 
to measure the PDF, we define a parameter rj that captures 
a simplistic notion of the total signal-to-noise ratio: 



]>(*) 



(3) 



(7) 



Now, it is standard to replace the problem of maximizing 
the likelihood C with a minimization of — 21n£, which in 
this case is 



21n£ = -2 V"lnp(si) 



(4) 



In comparing the data to a potential model, we bin the 
values of s in order to have a manageable number of bins 
(Nb = 1000) even when N p is very large. This is justified as 
long as the bin width is much smaller than any s-scale that 
we hope to resolve in the PDF. We have explicitly checked 
that using Nb = 1000 bins (with the C-statistic, see below) 
gives the same results as applying equation Q directly, even 
for the largest values of N p that we use in this paper. Now, 
when the expected (according to a model p(s)) number of 
points n eX pj' in each bin j is large (i.e., n cxp ,j 3> 1), the 
actual number rij has a standard error of ^/n cxpj ', and we 
can find the best-fitting model by minimizing a standard x 
statistic: 



N B 

77c Xp ,j 

3=1 



(5) 



However, in modeling the PDF we often wish to include 
a wide range of s, including some bins where the model prob- 
ability density p(s) is very low. When n exPi j is small, the \ 2 
distribution with its assumption of a Gaussian distribution 
for each rij severely underestimates the fluctuations in rij 
compared to the correct Poisson distribution. Thus, equa- 
tion §5§ can lead to major errors if n cxp j <C 1 in any bin. In 
this situatio n, the correct statistic to use is the C-statistic 
l| Cash! Il979l ) . derived from the Poisson distribution just as 
the x 2 statistic is derived from the Gaussian distribution. 
The C-statistic is defined as 



C 



N E 

' ^ ( n c Xp , ; 



(6) 



Note that the Poisson distribution also has a factor of rij! 
in the denominator, which results in an additional Inn,! 
term within the sum in equation ((6} , but this term does not 
depend on the model parameters (which enter only through 
n exPl j) and can thus be dropped from the minimization. 



motivated by sg as a measure for the signal and a j \J N p 
as a measure for the effective noise after N p measurements 
with noise a in each. Of course, the ability to detect the 
two separate peaks also depends on a, in that values close 
to or 1 make one of the peaks insignificant. For a fixed a, 
though, we might naively expect that the accuracy of the 
reconstructed values of sg and a would not change with the 
input value of sg, as long as we change N p so as to keep the 
combination 77 fixed. 

To test this, we fix the input a = 0.4 and sg = 1, 
and vary a and N p together so as to keep 77 fixed. We test 
rj — 400 and 4000, values comparable to those expected in 
the real experiments discussed later in the paper. The Monte 
Carlo results are summarized in Figure [1] The results show 
that the parameters can be accurately reconstructed as long 
as the signal-to-noise per sample (or per pixel in real data) 
sg/<t > 1. As long as this is the case, the relative error in 
sg and a is no worse than 4% (rj = 400) or 0.4% (77 = 
4000), and the average reconstructed values are essentially 
unbiased. However, once sg/c drops below unity (i.e., a > 1 
in this particular case), the errors increase rapidly with a, so 
that for 77 = 400 reconstruction is impossible when a — 10 
(i.e., both the bias and spread are of order unity) , and for 
rj — 4000 the errors increase when a = 4 to a 5% relative 
spread in a. 

The reason for these increasing errors is parameter de- 
generacy, as illustrated in Figure for rj = 400. While for 
(j = l the reconstructed parameter distribution is fairly sym- 
metrical about the input values of sg and a, resembling a 
standard error ellipse, larger a values produce a stretched 
error contour that displays a clear (partial) degeneracy be- 
tween the parameters sg and a. Intuitively, when sg/ct <C 1 
the PDF consists of a narrow input signal (two peaks sep- 
arated by sg, in the case of the toy model) convolved with 
a broad Gaussian of width a. The result is a broad Gaus- 
sian of width a, with small bumps (distortions). Apparently 
these small bumps can be produced with very different pa- 
rameter combinations, resulting in a degeneracy that leads 
to a large uncertainty when fitting models. While we have 
considered here a simple toy model, a similar degeneracy is 
encountered with the real 21-cm PDF, as discussed below. 
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Figure 1. For each model parameter x reconstructed in each 
Monte Carlo trial, we show the bias in the average (i.e., the en- 
semble average (x) minus the input value xj n ) and the standard 

deviation a x = */ {% 2 ) — (^) 2 . We consider the model parameters 
sq (solid curves, input value 1) and a (dashed curves, input value 
0.4), as a function of the noise level (i.e., width of each Gaussian) 
a, where r) is held fixed at 400 (left panels) or 4000 (right panels). 
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Figure 2. Distribution of reconstructed model parameters sg 
and a in 1000 Monte Carlo simulations. The input parameter 
values are sq = 1 and a = 0.4. We vary a keeping r\ = 400 fixed, 
so that the number of samples is N p = 160, 000 a 2 . Different 
panels cover different x ranges, but all x axes are shown on the 
same scale for easy comparison. In the a $J 2 panels, small tick 
marks are at 0.75 and 1.25. 



3 THE 21-CM PDF IN SIMULATIONS 
3.1 Numerical Simulation 

In this paper we utilize a large-scale N-body and radia- 
tive transfer simulation of cos mic reionization fol lowing the 
methodology first presented in lllievet al.l(|2006bl) . The cos- 
mological structure formation and evolution is fo llowed with 
a par ticle-mesh N-body code called PMFAST (|Merz et all 
120051 ). These N-body results then provide the evolving den- 
sity field of the IGM, as well as the location and mass of 
all the halo sources, as input to a separate radiative trans- 
fer simulation of inhomogeneous reionization. The latter is 
performed with the C 2 — Ray (Conservative, Causal Ray- 
Tracing) code, a regular-grid, ray-tr acing, radiative transfe r 
and nonequilibrium chemistry code l|Mellema et al-lfeOQGal ). 
The ionizing radiation is ray-traced from every source cell 
to every grid cell at a given timestep using a method of 
short characteristics. C 2 — Ray is designed to be explicitly 
photon-conserving in both space and time, which ensures an 
accurate tracking of ionization fronts, independently of the 
spatial and time resolution. This is true even for grid cells 
which are very optically thick to ionizing photons and time 
steps long compared to the ionization time of the atoms, 
which results in high efficie ncy. The code has be en tested 
against analytical solutions (jMellema et aLlfeoOGal ). and di- 
rectly compared with other radiative tran sfer methods on a 
stand ardized set of benchmark problems (|lliev et alj|2006al . 
l200Sl ). 

We simulated the ACDM universe with 1624 3 dark mat- 
ter particles of mass 2.2 x 10 7 M Q , in a comoving simulation 
volume of (100 h~ x Mpc) 3 . This allowed us to resolve (with 
100 particles or more per halo) all halos of mass 2.2 x 10 9 M© 
and above. The radiative transfer grid has 203 3 cells. The 
H-ionizing photon luminosities per halo in our cosmic reion- 
ization simulations are assigned as follows. A halo of mass 
M is assumed to have converted a mass M-(Qb/Q m )-f* into 
stars, where /„ is the star formation efficiency. Halo catalogs 
are discrete in time, because N-body density fields are stored 
every At ~ 20 Myrs and the corresponding halo catalogs are 
produced at the same time. If each source forms stars over 
a period of time At and each stellar nucleuf] produces Ni 
ionizing photons per stellar lifetime and is used only once 
per At, and if a fraction / csc of these photons escape into 
the IGM, then the ionizing photon number luminosity of a 
halo of mass M is given by 



Ni ■ /esc • /* • M (O t /fi ro ) 



At- 



/j,mn 



(8) 



where run is the mass of a hydrogen atom and /1 = 1.22 
so that (BiH is the mean mass per nucleus. In this model, 
stars are produced in a burst, and they keep radiating with 
a fixed Qi for At ~ 20 Myrs. We cho ose here a s p ecific case, 
first presented (an d labeled f250) in llliev et ail (|2007l ) and 
further discussed in llliev et al.l l|2008l ). In this scenario, halos 
are assumed to host relatively low efficiency emitters, with 
/ 7 = /i/oscAi = 250 (corresponding, e.g., to Pop II stars 
with a Salpeter IMF). 

The simulation we use in this work assumes a flat 



1 Note that we defined this number per atomic nucleus rather 
than per baryon in stars. 
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(ilfe = 0) ACDM cosmology. The simulation is based on 
the WMAP 3-year results, which derived the parameters 
(QmMAMb.h as^n ) = (0.24,0.76,0.042,0.73,0.74,0.95) 
jSpergel et all [2007). Here fi m , S7a, and fib are the total 
matter, vacuum, and baryonic densities in units of the crit- 
ical density, as is the root-mean-square density fluctuation 
on the scale of 8/i _1 Mpc linearly extrapolated to the present, 
and n is the power-law index of the primordial power spec- 
trum of density fluctuations. 



3.2 The Simulated 21-cm PDF 

During cosmic reionization, we assume that there are suffi- 
cient radiation backgrounds of X-rays and of Lya photons so 
that the cosmic gas has been heated to well above the cosmic 
microwave background temperature and the 21-cm level oc- 
cupations have come into equilibrium with the gas tempera- 
ture. In this case, the observed 21-cm differential brightness 
temperature (i.e., relative to the cosmic microwave back- 
ground) is independent of the spin temperatur e and, for our 
assum ed cosmological parameters, is given by ijMadau et all 
1 19971 ) 

yam-* •« 

with $ = x n [l + 8], where x n is the neutral hydrogen frac- 
tion and S is the relative density fluctuation. Under these 
conditions, the 21-cm fluctuations are thus determined by 
fluctuations in ^. We denote the PDF by p(Tb), normalized 
so that Jp(T b )dT b = 1. 

To calculate the 21-cm PDF, we smooth the 21-cm emis- 
sion intensity over our full simulation volume with a cubical 
top-hat filter (sometimes referred to as "boxcar" averaging) 
of a pre-determined size R p i x . We then assemble the PDF of 
the resulting values over a fine grid, much finer than _R p i x . 
This effectively smooths out the fluctuations in the PDF and 
yields a smooth function, but we note that there is still a real 
sample (or "cosmic") variance limit on the accuracy of our 
simulated PDF, resulting from the limited number of inde- 
pendent volumes of size 7? p j x within our simulation box. We 
use .Rpi x = 5/i _1 Mpc, 10 h' 1 Mpc, and 20 h' 1 Mpc, yield- 
ing a number of independent volumes equal to 8000, 1000, 
and 125, respectively. The analogous results for the first- 
year WMAP cos mology were pre v iously presented, for a few 
redshifts only, in iMellema et all (l2006b|) (with a similarly- 
defined ionized fraction PDF shown in Tiliev et all (|2006bh ). 

Figure [3] shows the overall progress of reionization as a 
function of redshift in the simulation. We calculate the PDF 
at 26 redshifts spanning a global mass-weighted ionization 
fraction Xi from 6 x 10 -6 to 99.0%, with the cosmic mean 
21-cm differential brightness temperature T b ranging from 
36.5 mK to 0.27 mK. Of course, we assume that T\ itself is 
not directly observable, due to the bright foregrounds. The 
main goal of the PDF analysis is to reconstruct Xi vs. z using 
the Tt fluctuations as captured in the PDF at each redshift. 

We show the measured simulation PDFs for various red- 
shifts and i? pix = 5 ft, _1 Mpc in Figure g] The PDF starts out 
close to Gaussian at high redshift, when the ionized volume 
is negligible and the density fluctuations on the scale 7? p j x 
are fairly linear and thus give a Gaussian PDF. There is also 
a clear skewness, seen particularly in a high-density tail that 
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Figure 3. The global progress of cosmic reionization in the sim- 
ulation, as a function of the redshift z. Bottom panel: we show 
the mass- weighted ionized fraction x i (solid curve) and the corre- 
sponding neutral fraction x n = 1 — Si (dashed curve). Top panel: 
we show the cosmic mean 21-cm differential brightness tempera- 
ture Tj, in the simulation (solid curve) , and the mean Tt, expected 
for a neutral universe of uniform density (dotted curve). Also 
indicated in each panel are the 26 output redshifts used in the 
analysis below (points). 



drops more slowly with TJ, than the Gaussian fit (more on 
the fitting function below); this results from the non- linear 
growth of density fluctuations. 

As reionization gets under way, the high-density tail 
drops off and (coincidental!/) approaches the Gaussian 
shape, as high-density pixels are more likely to be partially 
or fully ionized and thus have their Tb reduced. When Xi 
reaches a fraction of a percent, the still fairly Gaussian PDF 
develops a significant low-T(, tail which is roughly exponen- 
tial (i.e., linear in the plot of log of the PDF). This tail cor- 
responds to pixels that are substantially ionized, i.e., where 
a large fraction of the pixel volume partially overlaps one 
or more ionized bubbles. Soon afterward, a significant peak 
can be seen near Tb — mK, corresponding to fully ion- 
ized pixels (i.e., pixels in which the hydrogen in the IGM 
has been fully ionized, but there may remain a small bit 
of high-density neutral gas). Near the mid-point of reion- 
ization (xi — 50%), there is still a half-Gaussian peak (at 
Tb ~ 20 mK), i.e., with a Gaussian drop-off towards higher 
Tb, now with a nearly flat exponential tail towards lower 
Tt, and a prominent peak at Tb = mK. The peak at zero 
increasingly dominates towards the end of reionization, as 
most pixels become fully ionized, but there remains an ex- 
ponential tail out to higher Tb, with a cutoff (at Tb ~ 20 
mK). 

The PDFs are shown for 7? pix = 10 /T 1 Mpc and 
20 /i _1 Mpc in Figure^] The qualitative evolution of the PDF 
throughout reionization is similar to the _R p i x = 5/t -1 Mpc 
case, but the PDF is narrower for larger 7? p i x since the 21- 
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Figure 4. The 21-cm PDF in 5h~ Mpc cubic pixels, shown ver- 
sus the differential brightness temperature T5. We show log 10 of 
the PDF, which itself is expressed in units of 1/mK. We show the 
PDF obtained from the simulation (alternating solid and dotted 
curves) and our best fits to them (alternating long-dashed and 
short-dashed curves). The 26 redshifts (see Figure [3j range from 
2 = 15.729 (top) to 7.460 (bottom). The highest-redshift PDF is 
shown at its actual value, corresponding to the labels at the top 
of the y-axis; each subsequent PDF is shifted vertically down by 
a factor of 10 in the PDF. The X mark points (where T b equals 
the best-fit Tf,) on three simulated PDFs: early in reionization 
(2 = 10.08, Si = 0.156), right after the midpoint (2 = 8.79, 
&i = 0.530), and late in reionization (2 = 7.75, S, = 0.948); these 
points mark the 12-redshift range that is used in the fitting of 
mock data in the following sections. 



cm fluctuations are smaller when smoothed on larger scales. 
Also, for larger i? p i x there are fewer pixels in the peak near 
Tb = mK since it is more difficult to fully ionize large pix- 
els. The PDFs for i? pix = 20 /i _1 Mpc are not so reliable, as 
they are measured based on only 125 independent volumes. 
Also, their shapes differ significantly from the PDFs in the 
smaller pixels, and so they cannot be successfully fitted with 
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Figure 5. Same as Figure[4]but for cubic pixels of size 10 ft Mpc 
(left panel) or 20 ft -1 Mpc (right panel). In the right panel we show 
only the simulated PDFs, and the X marks the peak of the PDF 
right after the midpoint of reionization (2 = 8.79). 



the same model used for the other PDFs. Thus, in this pa- 
per we focus on the two smaller values of i? p i x and only 
present fits to the corresponding PDFs. Observations of the 
PDF are most promising during the central stage of reion- 
ization, when the PDF has two significant, well-separated 
peaks rather than a single narrow peak (as is the case ei- 
ther very early or very late in reionization). This two-peak 
regime covers Xi ~ 30 — 90% for _R p i x = 5 /i~ 1 Mpc, but only 
Xi ~ 75 — 95% for i? p i x = 10 /i _1 Mpc, because of the rarity of 
fully ionized pixels in the latter case. However, even without 
a strong peak at zero, the extended nearly flat (exponential) 
part of the PDF during reionization helps in measuring the 
PDF, as we find below. 



3.3 The GED Model Fit to the Simulated PDF 

Previous analytical models of the PDF do not describe our 
simulated PDFs well. While the Gaussian at high redshift 
and the T& = mK Delta function at the end of reioniza- 
tion are obvious, the precise shape at intermediate redshifts 
seems to depend on the precise topology of the ionized bub- 
bles and the geometry of their overlap with the cubic pixels. 
Here we take an empirical approach based on our numer- 
ical simulation. Thus we use a Gaussian + Exponential + 
Delta function (GED) model for the PDF p(T b ). The Dirac 
Delta function is centered at zero, and is connected with an 
exponential to the Gaussian. The model depends on four in- 
dependent parameters: Tg (center of Gaussian), a a (width 
of Gaussian), cg (height of Gaussian peak) and Tl (transi- 
tion point between the exponential and the Gaussian). Our 
GED model is thus: 
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Pi(T b ) = P D 5 D (T b ) + a exp(A7V 



p(T b ) = 



cg exp 



(T b - TgY 
2a% 



< T b T t 
T b > T L 



where Sd(x) is the Dirac delta function. The quantities a and 
A can be expressed in terms of the above four parameters by 
requiring the exponential and Gaussian functions to connect 
smoothly at T b — Tl- The conditions pi(Tl) = Pz(Tl) and 
Pi(Tl) =p' 2 {Tl) lead to 

, (ID 



A 



cg exp 



(T L - T G y 
2a G 



-XT L 



(12) 



Also, Pd is determined by the requirement of normalization; 
the total integrated probability is unity if Pd = 1 — Pe ~ Pg , 
where 



Pe 
Pg 



Pi(T b )dT b — — [exp(ATt) — 1] 



(13) 



P2{T b )dT„ = CG ./fa G erfc ( Tl T ° ) (14) 



Note that the parameters Pd, Pe and Pg represent the rel- 
ative contribution to the total probability from the delta 
function, the exponential function, and the Gaussian func- 
tion, respectively. 

Using the GED model, we determine the values of Tg, 
(jg, cg and Tl as functions of redshift by fitting to the sim- 
ulation PDFs for pixels of 5 /i _1 Mpc and 10 /i _1 Mpc. In ap- 
proaching this fitting, we note that we focus on the main 
features of the PDF, and not on the fine details. In partic- 
ular, we do not worry about features that contain a small 
fraction of the total probability, or on the detailed PDF 
shape on scales finer than several mK. This is justified since 
the observations are difficult, and most likely will not be 
sensitive to these fine details, at least in the upcoming 21- 
cm experiments. In addition, our simulated PDF may not 
be reliable in its fine details, since we are using a single, lim- 
ited simulated volume, and more generally, numerical simu- 
lations of reionization still lack a detailed demonstration of 
convergence. 

Thus, we do not try to fit the detailed peak shape at 
T b = 0, but instead represent the total probability of that 
region with the Delta function. In practice we only fit to the 
data beyond the lowest values of T b , and then set the Delta 
function contribution Pd to get the correct overall normal- 
ization. Specifically, for each PDF we first find T b ,n which is 
the highest value of T b where p(T b ,h) > 10 -4 . We then only 
fit to the data with T b > 5mK, if T b , h > 20 mK, or to the 
data with T b > T bih /4, if T b>h < 20 mK. At redshifts where 
the simulation data do not have a Delta function feature, i.e., 
there are no pixels near T b — 0, we make a fit constrained 
by setting Pd = 1 — Pe — Pg = 0; this is the case at the 
highest redshifts, namely z ^ 10.924 for i? p i x = 5ft Mpc 
and z > 9.034 for i? pix = 10/i _1 Mpc. 

Our GED model fits are shown along with the PDFs 
in Figs. [4] and [5] The fits are very good during the central 
and late stages of reionization, except for the detailed shape 
(which we do not try to fit) of the T b = peak which ex- 
tends out to T b ~ 2 — 4 mK. These are the redshifts that 
we focus on in this paper, and which are most promising to 



observe. The fits are also quite good at the highest redshifts, 
(10) where the simulated PDF is essentially Gaussian except for 
the skewness. This skewness, though, affects mainly the tails 
of the distribution; e.g., at the highest redshift (z = 15.729) 
for J?pi x = 5ft _1 Mpc, ~ 60% of the total probability is con- 
tained at T b values above the peak of the PDF, i.e., the 
high-density tail adds about 10% to the 50% of a symmetri- 
cal Gaussian. As noted above, this high-density tail declines 
with time due to ionization offsetting the high density of 
overdense pixels. Thus, the high- 7], tail becomes well fitted 
by the Gaussian model once Xi rises above a few percent. At 
later times the cutoff becomes somewhat steeper than the 
Gaussian fit, especially for i? p i x = 10/i _1 Mpc, but this only 
affects the insignificant tail end of the PDF at the highest 
T b . For instance, for -R p i x = 10/i _1 Mpc at Xi = 0.530, the 
tail beyond T b = 23 mK (where the cutoff starts to differ 
significantly from the fit) contains only 0.2% of the total 
probability. 

Another small mismatch occurs when reionization gets 
significantly under way but is still fairly early. The transition 
region from a near-Gaussian to a near-exponential shape is 
not well-fit at these times by our model, and as a result the 
fit is significantly below the low- 7],, roughly linear (expo- 
nential) tail. This mismatch is significant in the range of Xi 
from a few percent up to ~ 30%, and at these redshifts this 
exponential tail typically contains only a few percent of the 
total probability (up to 10%). 

Figure [6] shows how our model parameters vary as cos- 
mic reionization progresses. The Gaussian peak position Tg 
and height cg both decline with time due to the increas- 
ing ionization of even low-density pixels. At least a half- 
Gaussian is present until x% ~ 60%, but after that Tl > Tg 
and only the Gaussian cutoff remains. The parameter <jg 
remains at a value of a few mK throughout reionization; it 
gives a measure of density fluctuations, initially purely and 
later together with some correlation with ionization. At the 
very end of reionization, Tg — > and then gg and cg lose 
their usual meaning (e.g., cg becomes an indirect parame- 
terization of the normalization of the exponential portion) . 

Figure [7] shows the evolution of the probabilities Pd 
(representing the delta function), Pe (exponential), and Pg 
(Gaussian), which together add up to unity. The Figure 
shows how the 21-cm PDF is gradually transformed from 
a Gaussian to a delta function, with the exponential dom- 
inating at intermediate times (mid to late reionization). 
Note that in the limit of infinite resolution, we would have 
Pd = Xi- With a finite resolution, Pd can be thought of as 
the cosmic ionized fraction smoothed at the observed res- 
olution. In practice, converting observed values of Pd, Pe, 
and Pg to the true Xi requires some modeling. 

We also calculate the variance {T'i) - (T b ) 2 from the 
PDF both directly from the original simulation data and 
from our GED model fits. We plot this in Figure [8] for two 
reasons. First, the plot shows that the GED model repro- 
duces the variance of the real PDF rather well, especially 
where the upcoming measurements are more promising (i.e., 
later in reionization). Second, the Figure illustrates a sym- 
metry in that the variance is maximum near the midpoint of 
reionization, and has lower values both before and after the 
midpoint; this symmetry helps explain a near-degeneracy 
that we sometimes find below, when we consider low signal- 
to-noise data for which it is difficult to measure the detailed 
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Figure 6. Our best-fitting GED model parameters Tq (solid 
curve), Tl (long-dashed curve), oq (short-dashed curve), and cq 
(dotted curve, different t/-axis range) as functions of the cosmic 
mass- weighted ionization fraction. They are obtained by fitting to 
the simulated PDFs for pixels of size 5h _1 Mpc (bottom panel) 
or 10 h~ 1 fApc (top panel). 
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Figure 8. Standard deviation y/ {T 2 } — (Ti,) 2 as a function of the 
cosmic mass-weighted ionization fraction. We show this quantity 
for the original simulation data (solid curves) and from our GED 
model fits (dashed curves). We consider the PDF in boxes of 
size 5/i _1 Mpc, 10/i _1 Mpc and 20/i _1 Mpc (top to bottom, only 
simulation data for the 20/i _1 Mpc case). 
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Figure 7. The derived probabilities Pp (solid curve), Pe (short- 
dashed curve) and Pq (long-dashed curve) as functions of the 
cosmic mass-weighted ionization fraction. They are obtained by 
fitting the GED model to the simulation PDFs for pixels of size 
5/i _1 Mpc (bottom panel) or 10 h~ 1 Mpc (top panel). 



shape of the PDF, and the variance is a major part of what 
can be measured. 



4 MONTE-CARLO RESULTS WITH ONE 
FREE PARAMETER 

In the rest of this paper, we present results for the expected 
accuracy of reconstructing the 21-cm PDF itself and the cos- 
mic reionization history from the PDF. To obtain these re- 
sults, we assume that our simulation accurately reproduces 
the real reionization process in the universe, and furthermore 
we assume that our GED model introduced in the previous 
section can be used as a substitute for the PDF from the sim- 
ulation. In the future, more realistic simulations and more 
elaborate PDF fits can be used instead, but the general idea 
will be the same: as long as the overall signal-to-noise ratio 
is low, it is essential to rely on simulations in order to both 
reconstruct and interpret the observed PDF. 

Of course, even if simulations perfectly predicted the 
21-cm PDF for given inputs, various astrophysical scenar- 
ios would give somewhat different ionizing source and sink 
properties, and might yield a variety of possible PDFs. We 
leave the detailed exploration of this issue for future work, 
and here assume that the simulated scenario matches reality, 
except that a small number of free parameters are allowed 
to vary and must be reconstructed by trying to match the 
observed PDF. In this section, we reconstruct reionization 
from the PDF under the most optimistic assumption, where 
we assume that the real PDF matches the simulated one as 
a function of just a single parameter, the ionization frac- 
tion Xi. Thus, at each redshift, we find the value of Xi that 
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best matches the observed PDF, assuming that the PDF 
varies with Xi as in the simulation. In practice we expect 
that Xi is indeed the main parameter that determines the 
PDF, but there should be some small additional dependence 
on redshift and astrophysical inputs. In the next section we 
explore a more flexible approach which makes much weaker 
assumptions. 

Thus, here we wish to know how well a certain experi- 
ment can determine Xi assuming this one-parameter model. 
An experiment is specified by a total number of pixels N p 
and a noise per pixel crjv. We can simulate an observed PDF 
from such an experiment at a given input x% by generat- 
ing N p data points from the PDF of equation {TO} and 
adding to each noise generated from a Gaussian distribution 
with standard deviation gm- The resulting Monte-Carlo- 
generated "observed" PDF is then compared, via the C- 
statistic of equation <(6j , to the model, which is equation (|10[) 
convolved with the Gaussian noise. This convolved function 
q{T b ) equals: 



q(T b ) = P D G(T b ,a N ) + qi (T b ) + q 2 (T b ) , 
where G is a Gaussian (eq.[2]), and 



91 (zy 



52 (t&; 



X 2 a% 

= ^expl^- 



+ XT b x 



(15) 



(16) 



erf 



\a 2 N + T b 
v/2i 



<7jV 



erf 



\a% — Tl +T b 



\/2, 



1 <tg J (Tb - T G f 



(17) 



erfc 



a 2 N (T L ~T G ) + a 2 G (T L -T b 



where u 2 = a G + a%. As noted above, in this section we 
regard q(T b ) as a one-parameter function of Xi, taking T G , 
a G , c G and Tl to be functions of Xi as shown in Figure [6] 
For clarity we denote the input, real cosmic ionized fraction 
simply Xi, while the free parameter which is the output of 
the fitting we denote x° ut . Note that we assume that the 
experimental setup is sufficiently well characterized that crjv 
is known and need not be varied in the fitting. Also note 
that while the various temperatures we have defined (Tb, Tl, 
and Tq) refer to the differential brightness temperature (i.e., 
mK refers to the absence of a cosmological signal) , in prac- 
tice, when the foregrounds as well as the cosmic microwave 
background are subtracted, the mean cosmological signal on 
the sky will be removed as well, since there is no easy way 
to separate out different contributions except through their 
fluctuations. Thus, as in section 12.11 we assume that the 
absolute T b cannot be measured, and in our fitting always 
measure T b with respect to its average value according to 
the PDF, both in each model and in each simulated data 
set. 

For the experimental specification, we adopt the 
(rough) expected parameters for one-year observations of 
a single field of view with the M WA. We use the relation s 
for 21-cm arrays from the review bv lFurlanetto et al.l (|2006h . 
adopting a net integration time ti nt — 1000 hours, a collect- 
ing area Atot ~ 7 x 10 3 m 2 , a field of view of nl6 2 deg 2 , and 
a total bandwidth Aftot = 6 MHz. Then assuming cubic 



pixels of comoving size r cam , we find 

N p = 6.0 x 10 6 ( rc ° m ^ 3 fi±£ 
p Kbh^Mpc J V 10 



(18) 



CTN = 200 



5h~ 1 Mpc 



In order to explore the dependence on the noise level, we also 
consider various specifications with lower noise in the same 
field of view, e.g., 1/2 the noise we denote as MWA/2 (which 
corresponds, e.g., to four- year data with the MWA), while 
1/10 the noise we denote as MWA/10 (which corresponds 
to the regime of larger, second-generation 21-cm arrays). 
Note that we include only Gaussian thermal noise, whose 
magnitude is determined by the receiver's system tempera- 
ture, which in turn is set by the sky's brightness tempera- 
ture which is dominate d by Galactic synchrotron emission 
l|Furlanetto et all 120061 ) . In particular this assumes perfect 
foreground removal from the 21-cm maps; we leave an anal- 
ysis of the effect of foreground residuals for future work. 

We note the following conversions between comoving 
distance and observational units of angular and frequency 
resolution: 



5/i _1 Mpc « 2.6' 



10 



0.37 MHz 



.(20) 



The diffraction limit of the MWA is several arcminutes, but 
its frequency resolution will be around 10 kHz. In principle, 
this allows a measurement of the PDF in skinny boxes (thin- 
ner in the redshift direction) rather than cubes. This would 
give us more points but with less signal in each, keeping the 
overall signal-to-noise ratio about the same. By accessing 
fluctuations on smaller scales, this skinny-box PDF would 
be somewhat broader than the cubic one but on the other 
hand, our quantitative results for the toy model above sug- 
gest that decreasing the signal-to-noise ratio per pixel in 
this way would have a strong tendency to introduce partial 
degeneracies. Thus, we do not expect this option to be pro- 
ductive (except in the cases when the errors in the cubic 
PDF are very small), and focus here on the simplest case of 
the 21-cm PDF measured in cubes. 

At each redshift, we generate 1000 Monte Carlo in- 
stances of observed PDFs and minimize the C-statistic to 
find the best-fitting model in each case. Results for MWA 
and MWA/2 errors are plotted in Figure [9j which shows 
that for first-generation experiments the larger (10/i -1 Mpc) 
boxes are much more promising, since the lower noise <tjv (by 
a factor of ~ 6) dominates despite the narrower PDF (com- 
pare Figs. |4] and [5} and smaller number of pixels N p (by 
a factor of 8). We note that lower noise is particularly im- 
portant in view of the partial degeneracy (demonstrated in 
section [2~3l for the toy model) that arises when <jn is greater 
than the characteristic width of the intrinsic PDF. The par- 
tial degeneracy is also apparent in comparing the MWA and 
MWA/2 cases, where at some Xi values, halving the errors 
crosses a degeneracy threshold and cuts the output uncer- 
tainty in a non-linear fashion. We caution that cases that are 
very near such a threshold may be susceptible to additional 
numerical errors. 

The same results are shown in Figure [10] in terms of 
relative errors, making it easier to see and compare both 
small and large errors. Specifically, in terms of the various 
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Figure 9. Expected la errors on reconstructing the cosmic mean 
ionized fraction from the PDF, assuming just one free parameter. 
Specifically, for each input value of Xi we show the output median 
(i.e., 50 percentile) x° ut as well as the 16 to 84 percentile range. 
We consider MWA 1-yr errors (left panels) or MWA/2 (right pan- 
els), for the PDF in 5/i _1 Mpc boxes (top panels) or 10 h~ 1 Mpc 
boxes (bottom panels). 



Figure 10. Same as Figure [9] but showing relative errors (see 
text), for better visibility of cases with small errors. We show 
fo (absolute value shown, where negative values are open circles 
and positive values are solid circles) , /+ (+ symbols), and /_ (— 
symbols). We consider MWA 1-yr errors (left panels) or MWA/2 
(right panels), for the PDF in 5/i _1 Mpc boxes (top panels) or 
10 h~ 1 Mpc boxes (bottom panels). 



percentile output ionization fractions (e.g., we denote the 
median by x° ut ' 50 ), we show fo = (x° ut ' 5 ° /x~i) — 1 (the rel- 
ative difference between the median output value and the 
true input value, representing the fractional bias of the re- 
construction), /+ = (i° ut,84 /a;° ut ' j0 ) — 1 (the relative dif- 
ference between the 84% and median values, representive 
the fractional +la spread), and /_ = 1 - (x° ut ' 16 /x° ut - 50 ) 
(the relative difference between the 16% and median values, 
representive the fractional — la spread). The Figure shows 
that the reconstruction is typically unbiased within the er- 
rors (i.e., the la range is significantly larger than the bias 
in the median), except for some points in the early stages of 
reionization. Only a little information is available with the 
PDF in the smaller boxes (except for a few redshifts with 
MWA/2 errors); typically the error ranges are smaller near 
the mid-point of reionization, partly due to the fact (see Fig- 
ure [HJ that the variance of the PDF suffices to distinguish 
the mid-point of reionization from its two ends, but the early 
and late stages are degenerate with each other in terms of 
the variance. A rather good measurement of the reionization 
history is expected with 10 /i -1 Mpc boxes, in the mid to late 
stages of reionization, down to 1% errors in measuring the 
cosmic mean ionized fraction (or even better with MWA/2 
errors). When the errors are small, the measurement is un- 
biased and has symmetric error bars. 

As shown in Figure 1111 lower errors (approaching 
second-generation experiments) would avoid the degeneracy 
and allow a meaningful measurement of the cosmic reion- 
ization history even with the PDF in the smaller boxes, but 
10 ft _1 Mpc boxes always give a more precisely measured out- 
put value by about an order of magnitude. The expected 



success in reconstructing the reionization history under the 
strict assumption of a single free parameter motivates us to 
consider in the following section a more flexible reconstruc- 
tion method. 



5 MONTE-CARLO RESULTS WITH A 

FLEXIBLE FOUR-PARAMETER MODEL 

In the previous section we showed the expected accuracy of 
reconstructing the cosmic reionization history from the 21- 
cm PDF, assuming the PDF shape is known as a function of 
the cosmic mean ionized fraction. In this section we drop the 
latter assumption and present results for the expected accu- 
racy of reconstructing the detailed shape of the 21-cm PDF 
directly from the data. We focus on the regime of second- 
generation experiments, since the expected MWA errors do 
not allow such a reconstruction. Even with the lower errors, 
the PDF cannot be reconstructed parameter free, so we as- 
sume that our four-parameter GED model from section [3731 
correctly describes the real intrinsic PDF (an assumption 
which is explicitly true in our Monte-Carlo setup). Other- 
wise we do not assume any restrictions, and allow the four 
parameters of the model to vary freely when fitting (again 
by minimizing the C-statistic) to the noisy mock PDF data. 

Specifically, we fit the four parameters Tg, Tl, aa, and 
cg- We consider fitting the PDF in 5 h~ Mpc boxes with 
MWA/10 or MWA/20 errors. Figure [12] shows that signifi- 
cant information can be reconstructed with MWA/10 errors, 
although the errors in the reconstructed parameters are usu- 
ally fairly large (with particular failures at the early stage 
of reionization). The derived total probabilities of the GED 
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Figure 11. Same as Figure flOl but we consider MWA/4 errors 
(left panels) or MWA/10 (right panels), for the PDF in 5 h^Mpc 
boxes (top panels) or 10/i _1 Mpc boxes (bottom panels). 



model components are shown in Figure 1131 in particular, 
the statistically significant measurement of the evolution of 
Pd (which is the cosmic ionized fraction smoothed over the 
5/i _1 Mpc resolution) shows that significant information can 
be extracted about the cosmic reionization history, even in 
this more flexible fitting approach. 

Since the errors on the reconstructed parameters with 
MWA/10 noise are still mostly of order unity, we explored 
further and found that MWA/20 is necessary to break most 
of the degeneracies. Figure [T4l shows that in this case the pa- 
rameters can usually be reconstructed to 1 — 10% accuracy 
(with symmetric error bars and insignificant bias). Specifi- 
cally we show the four quantities Pd , Pg , Tq and og , which 
together comprise a complete set that specifies the GED 
model. Note that the measurement of Pd is particularly pre- 
cise, during the latter stages of reionization. 

As in the previous section, the PDF in larger, 
10 h~ 1 Mpc boxes, is easier to measure, due to the lower noise 
per pixel. Thus, here we consider somewhat larger noise lev- 
els, MWA/5 and MWA/10, with results shown in Figs. \T5\ 
and 1161 Note that the last (highest Xi) point in Tg is not 
shown, since the input Tg there is zero (see Figure [BJ, and 
also, we show Pd only during late reionization, where it is 
non-zero (see Figure [7]), and Pe at earlier times. While the 
errors are fairly large with MWA/5 errors, they reach the 
1 — 10% level with MWA/10, corresponding to a second- 
generation 21-cm experiment. 



6 CONCLUSIONS 

We have carried out a detailed quantitative analysis of 
whether upcoming and future experiments can measure the 
shape of the 21-cm PDF and derive from it the cosmic reion- 
ization history. This is an important question since the PDF 



Figure 12. Expected la errors on reconstructing the PDF pa- 
rameters assuming the four-parameter GED model, assuming 
MWA/10 errors on the PDF in 5/i -1 Mpc boxes. We show the 
16, 50, and 84 percentiles, as before, and also the assumed input 
values (circles). 
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Figure 13. Expected lcr errors on reconstructing the derived 
probabilities of the GED model, from a four-parameter fit to the 
PDF, assuming MWA/10 errors and 5/i _1 Mpc boxes. We show 
the 16, 50, and 84 percentiles, as before, and also the assumed 
input values (circles). 
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Figure 14. Expected la errors on reconstructing various quan- 
tities of the GED model, from a four-parameter fit to the PDF, 
assuming MWA/20 errors and 5h~ x Mpc boxes. As in the previ- 
ous section, we show the relative errors fo (absolute value shown, 
where negative values are open circles and positive values are solid 
circles) , /+ (+ symbols), and /_ (— symbols). 



Figure 16. Expected lc errors on reconstructing various quan- 
tities of the GED model, from a four-parameter fit to the PDF, 
assuming MWA/10 errors and 10h _1 Mpc boxes. We show the 
relative errors fo (absolute value shown, where negative values 
are open circles and positive values are solid circles) , /+ (+ sym- 
bols), and /_ (— symbols). 
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Figure 15. Expected lcr errors on reconstructing various quan- 
tities of the GED model, from a four-parameter fit to the PDF, 
assuming MWA/5 errors and 10 h~ 1 Mpc boxes. We show the rel- 
ative errors /o (absolute value shown, where negative values are 
open circles and positive values are solid circles) , /_)_ (+ symbols), 
and /_ (— symbols). 



during reionization is highly non-Gaussian, it directly pro- 
vides important information such as the cosmic ionization 
fraction at each redshift (though smoothed on the experi- 
mental resolution scale), and is potentially a way to derive 
the cosmic reionization history independently of the stan- 
dard power spectrum analysis. 

We developed a maximum-likelihood approach that 
achieves maximum efficiency by minimizing the C-statistic 
(eq. poj) applied to binned PDF data. We used a toy PDF 
model of two Gaussians (eq. [1} to show that the simplistic 
notion of signal-to- noise ratio (eq. [7]) does not fully describe 
the ability to extract the PDF out of noisy data. Instead, 
once the noise per pixel rises above a few times the signal 
(i.e., the width of the intrinsic PDF), the errors blow up due 
to a strong degeneracy, even if the total signal-to-noise ratio 
is kept fixed by increasing the number of pixels (Figs. [1] and 

We measured the 21-cm PDF as a function of redshift 
in a large-scale N-body and radiative transfer simulation of 
cosmic reionization (Figs. [4] and [5J. The PDF starts out 
close to Gaussian at high redshift, due to still-linear density 
fluctuations, later develops an exponential tail at low Tb, 
and finally becomes strongly peaked at zero towards the 
end of reionization. We empirically fit the PDF from the 
simulation with a four-parameter Gaussian + Exponential 
+ Delta function (GED) model fea.flUl Figs. |6] and 0. 

Assuming the simulations as a reliable guide for the 
evolution of the PDF, we quantitatively explored how well 
parameters can be measured with two different approaches. 
In the most optimistic approach, we assumed that the real 
PDF matches the simulated one as a function of just a sin- 
gle free parameter, the ionization fraction Xi, and tried to 
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reconstruct this parameter from noisy mock data. We found 
that first-generation experiments (such as the MWA) are 
promising, at least if relatively large (10/i^ 1 Mpc) pixels are 
used along with their relatively low noise level per pixel. 
Specifically, a rather good measurement of the reionization 
history is expected in the mid to late stages of reionization, 
down to 1% errors in measuring the cosmic mean ionized 
fraction. 

We also considered reconstructing the cosmic reioniza- 
tion history together with the PDF shape, all while assuming 
that the four-parameter GED model correctly describes the 
real intrinsic PDF, but allowing the four parameters to vary 
freely when fitting mock data at each redshift. We found that 
this flexible approach requires much lower noise levels, char- 
acteristic of second-generation 21-cm experiments, to reach 
the level of 1 — 10% accuracy in measuring the parameters 
of the 21-cm PDF. 

We note that cosmic reionization ends in our simulation 
at redshift 7.5 (Fig. [3}. If reionization in the real universe 
ends later (e.g., closer to z = 6.5), then observations will 
be somewhat easier than we have assumed, due to the re- 
duced foregrounds at lower redshift. On the simulation side, 
further work is necessary to establish the numerical con- 
vergence of the simulated 21-cm PDF during reionization, 
and to explore the dependence of the PDF on various astro- 
physical scenarios for the ionizing sources and sinks during 
reionization. This further effort is warranted since we have 
shown that the 21-cm PDF is a promising alternative to the 
power spectrum which can independently probe the cosmic 
reionization history. 
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